itsonlyamodel

Tibet Snow-Man

Introduction

Snow and Ice cover has been collected in .asc files here by the National Snow and Ice Data Center (NSIDC). One product of interest is the Ice Mapping System (IMS) which takes satalite and field measurements and puts them on a grid. Currently they have resolutions of about 24x24km 4x4km and 1x1 km. Their website is located here. It has been used to show snow and ice coverage across the northern hemisphere. Rutgers has done a great job displaying this coverage, shown here.

For those like me, who want to display, subsections or timeseries, or to use this set in their research, accessing and anylizing the data can be cumbersome. First, of all, these files are all zipped and archived on their FTP site. If you want to grab a subset, you have to download them yourself, and unzip them individually. If you know how to program, you can just automate this, but unfortunatly, the files format changes in the early years, so your automation has to check and see which format the file is using. When you get to the 4x4km resolution, there are some files that contain errors, your automation script may crash if you don't handle these errors accordingly. Finally, when viewing the files, the grid cell areas aren't given. Flatting out the Earth's surface onto a stereographic projection distorts the areas of each grid, so that you don't really know what the areas represent. Shown below is one such file, with snow and ice removed.

In [1]:
from IPython.display import Image
Image(filename='data/dry_planet_24km.png')
Out[1]:

Fortunatly, I have gone through the pain of these issues and released a project called tibet snow-man on github. I focused on the Tibetan Plateau (TP), but this project can work for any region in the northern hemisphere. The project does the following:

  1. parses out the files locally (use somthing like filezilla to download the sets yourself) and flattens them out into a timeseries. Each year is saved in a HDF5 database that is accessed later on.

  2. Filters for a given lat-long square, for example the TP region falls within 25-45 latitude and 65-105 longitude

  3. Estimates the area of each grid cell using provided lat-long files. This step is explained in more detail later on in this notebook.

  4. Sums up snow and ice coverage for a given areas and makes a timeseries.

  5. Generates images showing snow cover on a map projection

The remainder of this article will be used to desribe these functionalities. With any luck, this notebook can help those who would like to create videos or timeseries similar to those shown below.

IMAGE ALT TEXT HERE

In [2]:
Image(filename='data/ts-compare.png')
Out[2]:

First I need to import the following libraries

In [3]:
from generate_grid_and_area import grid_and_area
import os
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import pandas as pd
%matplotlib inline
import datetime
import pdb
import seaborn as sns
from matplotlib import rc
import matplotlib.ticker as mtick
rc('text', usetex=True)
rc('font', family='monospace', size = 20)

2. Filtering areas

The beauty of a dataframe is that it can be filtered easily. grid_and_area class contains a dataframe object that is filtered in the method reduceLatLong().

First we instantiate the object grid_maker_filtered. Defines the dataframe object and creates a multiindex comprised of row and column number.

In [4]:
home_dir = os.getcwd()
data_dir = home_dir+'/data/'
no_snow_planet_name = 'dry_planet_24km.asc'
lat_grid_filename = 'imslat_24km.bin'
lon_grid_filename = 'imslon_24km.bin'
grid_size = 1024
lat_long_coords_filtered = {'lower_lat':35,'upper_lat':36,'lower_long':85,'upper_long':86} #set as lower and upper bounds for lat and long
grid_maker_filtered = grid_and_area(lat_long_coords_filtered,data_dir,no_snow_planet_name,grid_size)

Next the lat-long files are added

In [5]:
grid_maker_filtered.addLatLong(lat_grid_filename,lon_grid_filename)
print('grid maker shape: {}'.format(grid_maker_filtered.df.shape))
grid_maker_filtered.df.head(5)
grid maker shape: (1048576, 2)
Out[5]:
lat long
row col
0 0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN

Note that there is no values for the corner of the map. There are no earth to report, so they are set as NaN. Next step reduces the dataframe to the region specified in lat_long_coords_filtered. Additionally the id column is added, giving each row a unique ID.

In [6]:
grid_maker_filtered.reduceLatLong()
print('grid maker shape: {}'.format(grid_maker_filtered.df.shape))
grid_maker_filtered.df.head(5)
grid maker shape: (24, 3)
Out[6]:
lat long id
col row
445 760 35.840794 85.096535 0
446 760 35.887020 85.312492 1
447 760 35.932606 85.528885 2
448 760 35.977543 85.745689 3
445 761 35.666016 85.153740 4

3. Areas

Latitude and Longitude coordinates are given for each point on the grid cell. Currently the lat-long files for the 1x1km grid have not been released yet.

The class grid_and_area defines creates an instance called grid_maker. Once initialized, addLatLong, reduceLatLong, makeNoSnowMap, and addAreas methods are called to create a dataframe object for the Tibetan Plateau.

In [7]:
home_dir = os.getcwd()
data_dir = home_dir+'/data/'
no_snow_planet_name = 'dry_planet_24km.asc'
lat_grid_filename = 'imslat_24km.bin'
lon_grid_filename = 'imslon_24km.bin'
lat_long_area_filename = 'lat_long_area.csv'
lat_long_coords = {'lower_lat':25,'upper_lat':45,'lower_long':65,'upper_long':105} #set as lower and upper bounds for lat and long
grid_size = 1024

grid_maker = grid_and_area(lat_long_coords,data_dir,no_snow_planet_name,grid_size)
grid_maker.addLatLong(lat_grid_filename,lon_grid_filename)
grid_maker.reduceLatLong()
grid_maker.makeNoSnowMap()

#tibet falls approximatly in this region.
grid_maker.addAreas()

df_whole = grid_maker.df
df_whole.reset_index(level = df_whole.index.names, inplace=True)
In [8]:
df_whole.head(10)
Out[8]:
col row lat long id noSnowMap noSnowMapRBG centroid_lat centroid_long x y area
0 392 683 44.917850 65.165344 0 2 [0, 128, 0] 44.8932 65.3566 713974.450086 3260701.648276 468.906395
1 391 684 44.647007 65.097160 1 2 [0, 128, 0] 44.6226 65.2872 701686.887868 3232339.307802 467.064901
2 392 684 44.757946 65.321815 2 2 [0, 128, 0] 44.733 65.5123 721784.538148 3240670.591263 467.814964
3 393 684 44.868347 65.547699 3 2 [0, 128, 0] 44.8428 65.7385 741915.649549 3248972.270844 468.568645
4 394 684 44.978203 65.774811 4 2 [0, 128, 0] 44.9521 65.966 762079.971665 3257244.208015 469.318381
5 390 685 44.376633 65.029869 5 2 [0, 128, 0] 44.3524 65.2187 689462.592713 3204012.684450 465.222722
6 391 685 44.487549 65.252861 6 2 [0, 128, 0] 44.4628 65.4421 709504.144434 3212348.420874 465.981307
7 392 685 44.597927 65.477066 7 2 [0, 128, 0] 44.5726 65.6667 729579.023035 3220654.702331 466.728680
8 393 685 44.707767 65.702477 8 2 [0, 128, 0] 44.6819 65.8925 749687.296819 3228931.422765 467.480009
9 394 685 44.817059 65.929115 9 2 [0, 128, 0] 44.7907 66.1195 769828.431329 3237178.497176 468.222499

Check areas with analitical results

The areas created are verified to be correct by adding them all up and compairing their collective area with an exact solution.

The Earth is modeled as a sphere of radius $R = 6371 \ km$ The haversine function calculates the great circle distance between two lat-long points in degrees.

$$ \Delta \phi = \phi_{2} - \phi_{1} \\ \Delta \lambda = \lambda_2- \lambda_1 \\ a = sin(\frac{\Delta \phi}{2})^{2} + cos(\phi_1) * cos(\phi_2) * sin(\frac{\Delta \lambda}{2})^{2} \\ c = 2 * arcsin(\sqrt(a)) $$

Where $\phi$ represnts latitude and $\lambda$ represents longitude. and c is the great circle length in degrees.

Next a semiperimiter area is calculated

$$ s = c + \pi + (\phi_{1} + \phi_{2})^{\frac{1}{2}} $$

Finally The area of a spherical triangle is given by the l'Hullier formula

$$ b = \sqrt{ tan(\frac{s}{2}) tan(\frac{s-d}{2}) tan(\frac{s - \pi/2 + \phi_{1}}{2}) tan(\frac{s - \pi/2 + \phi_{2}}{2})} \\ \Delta = 4.0 * arctan(b) $$

where $\Delta$ is the radiall area for a spherical triangle.

Treating the longitude points $\lambda_1 = 65^{\circ}$ $\lambda_2 = 105^{\circ}$ and setting $\phi_{1} = \phi_{2}$. Two different spherical triangle areas are obtained for $\phi_top = 45^{\circ}$ $\phi_{bottom} = 25^{\circ}$, $\Delta_{top}$ and $\Delta_{bottom}$ are obtained.

The area of Tibet in $km^2$ is given to be

$$ A_{tibet} = R^2(\Delta_{top} - \Delta_{bottom}) $$

$A_{tibet}$ is treated as the exact solution and compared to the sum of areas in our data frame

In [9]:
lat_long_area_filename_24 = 'lat_long_centroids_area_24km.csv'
df_24km = pd.read_csv(data_dir+lat_long_area_filename_24, index_col=(0,1))
lat_long_area_filename_4 = 'lat_long_centroids_area_4km.csv'
df_4km = pd.read_csv(data_dir+lat_long_area_filename_4, index_col=(0,1))



from math import radians, cos, sin, asin, sqrt

def haversine_formula(lat1, lat2,lon1, lon2):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees)
    """
    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    return c


def semi_perimeter(d, phi_1, phi_2):
    s = (d  + np.pi  + (phi_1 + phi_2) )* .5
    return s

def lhuilier(s, d, phi_1, phi_2): #uses unit sphere R=1

    inner_sq = np.sqrt( np.tan( 0.5*s ) * 
                        np.tan( 0.5*(s-d) ) * 
                        np.tan( 0.5*(s-(np.pi/2 + phi_1)) ) * 
                        np.tan( 0.5*(s-(np.pi/2 + phi_2)) ) )
    ans = 4.0 * np.arctan(inner_sq)
    return ans


bottom_phi = 25
top_phi = 45
left_lambda = 65
right_lambda = 105

# convert decimal degrees to radians 
bottom_phi, top_phi, left_lambda, right_lambda = map(radians, [bottom_phi, top_phi, left_lambda, right_lambda])

R = 6371 #km

d_bottom = haversine_formula(bottom_phi, bottom_phi,left_lambda, right_lambda)
print('bottom length (km): {}'.format(R * d_bottom))
s_bottom = semi_perimeter(d_bottom, bottom_phi, bottom_phi)
E_bottom = lhuilier(s_bottom, d_bottom, bottom_phi, bottom_phi)


d_top = haversine_formula(top_phi, top_phi,left_lambda, right_lambda)
print('top length (km): {}'.format(R * d_top))
s_top = semi_perimeter(d_top, top_phi, top_phi)
E_top = lhuilier(s_top, d_top,top_phi, top_phi)



tibet_area = R**2 * (E_top - E_bottom)

print('tibet area via 24x24 km grid files: {}'.format(df_24km['area'].sum()))
print('tibet area via Haversine formula: {}'.format(tibet_area))
perc_dif_24 = 100*(df_24km['area'].sum()-tibet_area)/tibet_area
print('percent difference: {0}'.format(perc_dif_24))

print('tibet area via 4x4 km grid files: {}'.format(df_4km['area'].sum()))
print('tibet area via Haversine formula: {}'.format(tibet_area))
perc_dif_4 = 100*(df_4km['area'].sum()-tibet_area)/tibet_area
print('percent difference: {0}'.format(perc_dif_4))
bottom length (km): 4015.86152343
top length (km): 3112.44504008
tibet area via 24x24 km grid files: 8062815.0957
tibet area via Haversine formula: 8059061.81949
percent difference: 0.0465721232846
tibet area via 4x4 km grid files: 8061716.73857
tibet area via Haversine formula: 8059061.81949
percent difference: 0.0329432773019

Not too shabby overall.

Plot region

The tibetian region is shown in white below. Rows and columns represnt the data file's indicies.

In [10]:
#%% Figure1
#zoom-in-area
zoom = (600,900,200,600)
backdrop = grid_maker.rbg_no_snow_matrix[zoom[0]:zoom[1],zoom[2]:zoom[3],:]
#plot of points of interest

llm_x=df_whole['col'].values-zoom[2]
llm_y=df_whole['row'].values-zoom[0]

fig_size = (12,8)
plt.rcParams["figure.figsize"] = fig_size

# Two subplots, the axes array is 1-d
#fig1 = plt.figure(1, figsize = (8,5))
fig1 = plt.figure(1)

label_fontsize = 14
ax1 = plt.axes()
plt.setp(ax1.get_xticklabels(), rotation='vertical',fontsize=12)
plt.setp(ax1.get_yticklabels(), fontsize=12)


#plt.figure(1, figsize=(12,5))
#ax1 = plt.subplot(1,2,1)
ax1.imshow(backdrop)
ax1.set_title('Plot of tibetian plateau grid',fontsize=label_fontsize)
ax1.set_xlabel('Column',fontsize=label_fontsize)
ax1.set_ylabel('Row',fontsize=label_fontsize)
ax1.set_xticklabels([200,250,300,350,400,450,500,550,600])
ax1.set_yticklabels([600,650,700,750,800,850,900])

ax1.scatter(llm_x,llm_y, c='w', s=10, marker = 'o',facecolor='0.5', lw = 0)


ax1.axis([0,400,300,0])
ax1.grid(True)


plt.show()

Show basemap projection

Basemap is relied upon to convert latitude-longitude coordinates from degrees to meters, referenced by some arbitrary point. See Basemap's Website. The code below creates a Lambert Azimuthal Equal Area projection object m, it also sets a blue marble background. The center point is plotted for illustration.

In [11]:
#make map
plt.figure(2)
long_center, lat_center = ((lat_long_coords['upper_long']-lat_long_coords['lower_long'])/2+lat_long_coords['lower_long'],(lat_long_coords['upper_lat']-lat_long_coords['lower_lat'])/2+lat_long_coords['lower_lat'])

m = Basemap(projection='laea',
            width = 4500000,
            height = 4000000,
            resolution='c',lat_0=lat_center,lon_0=long_center)
x, y = m(long_center, lat_center) # this function converts degrees to meters on this reference map
print('center point x: {} meters'.format(x))
print('center point y: {} meters'.format(y))
m.plot(x, y, marker='D',color='cyan')
m.bluemarble()
plt.show()
center point x: 2250000.0 meters
center point y: 2000000.0 meters

Interactive Tibet Areas

Areas over the TP vary substantially. As it turns out, there is quite a substantial differenece in areas when travelling along latitude.

Plotly is used to render interactive plots. To export and share, you have to pay an annual subscription, these images may not show up on this blog but You can install plotly on your own environment and run this yourself though! Sorry folks. There is a static image below to show the areas on a color scale.

In [12]:
Image(filename='data/areas_of_tibet_grid_2.png')
Out[12]:
In [13]:
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
import plotly.graph_objs as go
init_notebook_mode()

def make_whole_area_text(X):
    return 'Area: %s km^2\
    
row:
%s\
column:
%s\
lat:
%s\
long:
%s\
id:
%s'\ % (round(X['area'],2), X['row'], X['col'],round(X['lat'],3),round(X['long'],3),X['id']) scl = [ [0,"rgb(5, 10, 172)"],[0.35,"rgb(40, 60, 190)"],[0.5,"rgb(70, 100, 245)"],\ [0.6,"rgb(90, 120, 245)"],[0.7,"rgb(106, 137, 247)"],[0,"rgb(220, 220, 220)"] ] trace = go.Scattergl( x = df_whole['long'], y = df_whole['lat'], text =df_whole.apply(lambda x: make_whole_area_text(x), axis=1), mode = 'markers', marker = dict( size = 8, opacity = 0.8, reversescale = True, autocolorscale = False, symbol = 'dot', line = dict( width=0, color='rgba(102, 102, 102)'), colorscale = scl, cmin = df_whole['area'].min(), color = df_whole['area'], cmax = df_whole['area'].max(), colorbar=dict( title="Km^2") ) ) data = [trace] layout = dict( title = 'Areas of Tibet Grid', colorbar = True, xaxis=dict( title='Longitude', titlefont=dict( family='Courier New, monospace', size=20 ), ), yaxis=dict( title='Latitude', titlefont=dict( family='Courier New, monospace', size=20 ), ), ) fig = dict( data=data, layout=layout ) plot_url = iplot(fig, validate=False)

Get max-min ratio

The range varies quite a lot, and is a source of error. Moral of the story: Each grid area must be calculated separatly. Don't assumet that each grid is $24x24km^2$ or $4x4 km^2$

In [14]:
df_whole['area'].max()/df_whole['area'].min()
Out[14]:
1.4398482062172655

Show zoomed in section of grid

This section zooms into a smallar region, showing how areas were calculated. As before, the non interactive image is provided below.

Centroids between for lat-long points are used to create the borders of a given point's domain. For example, a given point $p_{i,j}$ located at row i column j surrounding corners are calculated.

$$ top \ left \ centroid_{i,j} = \frac{\Sigma( {p_{i-1,j-1}, p_{i,j-1} , p_{i,j} , p_{i-1,j} } )}{4} \\ top\ right\ centroid_{i,j} = \frac{\Sigma( {p_{i-1,j}, p_{i-1,j+1} , p_{i,j+1} , p_{i,j} } )}{4} \\ bottom\ right\ centroid_{i,j} = \frac{\Sigma( {p_{i,j}, p_{i,j+1} , p_{i+1,j+1} , p_{i+1,j} } )}{4} \\ bottom\ left\ centroid_{i,j} = \frac{\Sigma( {p_{i-1,j}, p_{i,j} , p_{i+1,j} , p_{i-1,j+1} } )}{4} \\ $$

The points are then projected onto a flat surface via the Lambert Azimuthal Equal Area projection. The latitude-longitude coordinates are converted into meters relative to an arbitrary reference point. Treating these points as corners of a polygon, the areas were found using the discrete version of Green's Theorem, also known as the shoestring formula shown below.

$$ A_{ij} = \frac{1}{2} \vert \sum\limits_{k=1}^{3}x_{k}y_{i+1} + x_{n}y_{1} - \sum\limits_{k=1}^{3}x_{k+1}y_{k} - x_{1}y_{n} \vert $$

This was done for the whole dataframe, creating a grid shown below.

In [15]:
Image(filename='data/areas_of_sub_section_i.png')
Out[15]:
In [16]:
lat_long_coords_zoomed = {'lower_lat':35,'upper_lat':36,'lower_long':85,'upper_long':86} #set as lower and upper bounds for lat and long
grid_maker_zoomed = grid_and_area(lat_long_coords_zoomed,data_dir,no_snow_planet_name,grid_size)
grid_maker_zoomed.addLatLong(lat_grid_filename,lon_grid_filename)
grid_maker_zoomed.reduceLatLong()
grid_maker_zoomed.makeNoSnowMap()

grid_maker_zoomed.addAreas()
df_zoomed = grid_maker_zoomed.df

df_zoomed.reset_index(level = df_zoomed.index.names, inplace=True)
In [17]:
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
from plotly.graph_objs import *
init_notebook_mode()

scl = [ [0,"rgb(5, 10, 172)"],[0.35,"rgb(40, 60, 190)"],[0.5,"rgb(70, 100, 245)"],\
    [0.6,"rgb(90, 120, 245)"],[0.7,"rgb(106, 137, 247)"],[0,"rgb(220, 220, 220)"] ]

def make_area_text(X):
    return 'Area: %s km^2\
    
row:
%s\
column:
%s\
id:
%s'\ % (round(X['area'],2), X['row'], X['col'], X['id']) def make_centroids_text(X): return 'row: %s\
column:
%s\
id:
%s'\ % (X['row'], X['col'], X['id']) centroid_df = df_zoomed[df_zoomed['row'] < 765] area_df = centroid_df[centroid_df['row'] > 760] centroid_df = centroid_df[ centroid_df['col'] > 444] area_df = area_df[ area_df['col'] > 445] trace0 = Scatter( name = 'grid points', y=area_df['lat'], x=area_df['long'], mode = 'markers', text =df_zoomed.apply(lambda x: make_area_text(x), axis=1), marker = dict( size = 8, opacity = 0.8, reversescale = True, autocolorscale = False, #showscale=True, showscale=False, symbol = 'dot', line = dict( width=0, color='rgba(102, 102, 102)'), colorscale = scl, cmin = df_zoomed['area'].min(), color = df_zoomed['area'], cmax = df_zoomed['area'].max(), colorbar=dict( title="Km^2") ) ) trace1 = Scatter( name = 'cell corners', y=centroid_df['centroid_lat'], x=centroid_df['centroid_long'], #text = df[ ['row', 'col']].values, #text = df['row'], text = centroid_df.apply(lambda x: make_centroids_text(x), axis = 1), mode = 'markers', #hoverinfo = "none", ) def get_centroid_coords(area_indexes, centroid_df): coords = [] df = centroid_df.set_index(['col', 'row'])[ ['centroid_lat', 'centroid_long'] ] for idx in area_indexes: col, row = idx item = [(df.ix[ (col-1,row)]['centroid_long'], df.ix[ (col-1,row)]['centroid_lat']), (df.ix[ (col,row)]['centroid_long'], df.ix[ (col,row)]['centroid_lat']), (df.ix[ (col,row-1)]['centroid_long'], df.ix[ (col,row-1)]['centroid_lat']), (df.ix[ (col-1,row-1)]['centroid_long'], df.ix[ (col-1,row-1)]['centroid_lat'])] coords.append(item) coords = np.array(coords) return coords #get col and row indexes from area_df area_indexes = area_df[['col', 'row']].values #return a list containing the centroid coordinates centroid_polygons = get_centroid_coords(area_indexes, centroid_df) #make polygon traces out of each centroid_polygons def make_scatter(x,y): return Scatter( x=np.append(x,x[0]), y=np.append(y,y[0]), name = 'cell boundary', mode='lines', line=dict( color='rgba(143, 19, 131, .5)', ), fill='none', fillcolor = "none", showlegend=False, hoverinfo = "none" ) traces= list(map( lambda pt: make_scatter(pt[:,0], pt[:,1]), centroid_polygons)) traces[0]['showlegend'] = False traces.append(trace0) traces.append(trace1) data = Data(traces) layout = dict( title='Section of map grid showing areas', xaxis=dict( title='Longitude', titlefont=dict( family='Courier New, monospace', size=18 ), ), yaxis=dict( title='Latitude', titlefont=dict( family='Courier New, monospace', size=18 ), ), showlegend = False, legend=dict( x = .82, y = 1), #orientation= "h"), hovermode='closest' ) fig = { 'data': data, 'layout': layout, } iplot(fig, filename = 'grid')

Showing Snow Annotations

With areas found, we can now see how much snow and ice coverage there is. Recall the dataframe

In [18]:
df_zoomed.head(3)
Out[18]:
col row lat long id noSnowMap noSnowMapRBG centroid_lat centroid_long x y area
0 445 760 35.840794 85.096535 0 2 [0, 128, 0] 35.7765 85.2329 2271013.493266 2086361.871301 404.981561
1 446 760 35.887020 85.312492 1 2 [0, 128, 0] 35.8222 85.4487 2290458.082799 2091519.305438 405.318659
2 447 760 35.932606 85.528885 2 2 [0, 128, 0] 35.8674 85.6649 2309918.669439 2096647.812303 405.646980

The column noSnowMap contains information taken directly from the .asc files. The table below illustrates the meaning of these digits

asc file marker meaning
0 out of range
1 water
2 land
3 ice
4 snow

When parsing the .asc files, numbers are translated from this table to either 1 and 0, signifying there being snow or no snow respectively.

dataframe marker meaning
0 no snow or ice present
1 snow or ice present

Shown in the illustration below for a given day, The top 5 cells are shown to be covered in snow. They are marked with a 1 and colored white The bottom are marked with a zero and are shown in green.

In [19]:
Image(filename='data/areas_of_sub_section_iii.png')
Out[19]:
In [20]:
def make_annotation(x,y,snow_bool):

    if snow_bool:
        text = '1'
    else:
        text = '0'

    annotation = dict(
            x=x-.025,
            y=y+.025,
            xref='x',
            yref='y',
            text=text,
            showarrow=False,
            font=dict(
                family='sans serif',
                size=25,)
        )
    return annotation

fig2 = fig

fig2['layout']['colorbar'] = False,
snow_bools = [True] * 5 + [False] * 15   
annotations = list(map( lambda pt: make_annotation(pt[0], pt[1], False), area_df[['long', 'lat']].values))
fig2['layout']['annotations'] = annotations


def make_poly(x,y, snow_bool):
     # filled Polygon
    line_color = 'rgba(0, 0, 0,1)'
    if snow_bool:
        color = 'rgba(5,5,5,1)'
    else:
        color = 'rgba(0,128,0,1)'
    
    shape = {
        'type': 'path',
        'path': ' M'+str(x[0])+','+str(y[0])+'L'+str(x[1])+','+str(y[1])+'L'+str(x[2])+','+str(y[2])+'L'+str(x[3])+','+str(y[3])+'Z',
        'fillcolor': color,
        'line': {
            'color': line_color,
        },
    }
    
    return shape

def get_centroid_coords(area_indexes, centroid_df):
    coords = []    
    snow_bool = [True] * 10 + [False] * 10
    df = centroid_df.set_index(['col', 'row'])[ ['centroid_lat', 'centroid_long'] ]
    for idx in area_indexes:
        col, row = idx
        item = [(df.ix[ (col-1,row)]['centroid_lat'], df.ix[ (col-1,row)]['centroid_long']),
                (df.ix[ (col,row)]['centroid_lat'], df.ix[ (col,row)]['centroid_long']),
                (df.ix[ (col,row-1)]['centroid_lat'], df.ix[ (col,row-1)]['centroid_long']),
                (df.ix[ (col-1,row-1)]['centroid_lat'], df.ix[ (col-1,row-1)]['centroid_long'])]
        coords.append(item)
    coords = np.array(coords)  
    return coords

#get col and row indexes from area_df
area_indexes = area_df[['col', 'row']].values
#return a list containing the centroid coordinates
centroid_polygons = get_centroid_coords(area_indexes, centroid_df)


poly_traces = list(map( lambda pt: make_poly(pt[:,1], pt[:,0], False), centroid_polygons))
                
for i, snow_bool in enumerate(snow_bools):
    if snow_bool:
        poly_traces[i]['fillcolor'] = 'rgba(255,255,255,1)'
        poly_traces[i]['line']['color'] = 'rgba(0,0,0,1)'
        annotations[i]['text'] = '1'

fig2['layout']['annotations'] = annotations
fig2['layout']['shapes'] = poly_traces
#fig['layout'].update(traces=traces)
iplot(fig2, filename = 'grid',validate=False)

4. Comparisons betweent the timeseries

Get files

In [21]:
equal_area_filename =  os.getcwd()+'/compare_timeseries/24_km_false_coverage.csv'
df_ea = pd.read_csv(equal_area_filename)
df_ea.rename(index=str, columns={u'perc coverage': '24_perc_ea', u'coverage (km^2)': '24km_cov_ea'}, inplace = True)
df_ea.index = pd.to_datetime(df_ea['timestamp'], format='%Y-%m-%d')
df_ea = df_ea[df_ea['24km_cov_ea'] != 0] #remove missing data
df_ea.drop(pd.Timestamp('2014-12-03'), inplace=True) #remove outlier
df_ea.drop(pd.Timestamp('2014-12-04'), inplace=True) #remove outlier

filename_24km = os.getcwd()+'/compare_timeseries/24_km.csv'
df_24 = pd.read_csv(filename_24km)
df_24.rename(index=str, columns={u'perc coverage': '24_perc', u'coverage (km^2)': '24km_cov'}, inplace = True)
df_24.index = pd.to_datetime(df_24['timestamp'], format='%Y-%m-%d')
df_24 = df_24[df_24['24km_cov'] != 0]
df_24.drop(pd.Timestamp('2014-12-03'), inplace=True) #remove outlier
df_24.drop(pd.Timestamp('2014-12-04'), inplace=True) #remove outlier

filename_4km = os.getcwd()+'/compare_timeseries/4_km.csv'
df_4 = pd.read_csv(filename_4km)
df_4.rename(index=str, columns={'perc coverage': u'4_perc', u'coverage (km^2)': '4km_cov'}, inplace=True)
df_4.index = pd.to_datetime(df_4['timestamp'], format='%Y-%m-%d')
df_4 = df_4[df_4['4km_cov'] != 0] #remove missing data
df_4.drop(pd.Timestamp('2005-02-10'), inplace=True) #remove outlier
In [22]:
%pylab inline
pylab.rcParams['figure.figsize'] = (10, 6)

colors = ["windows blue", "amber", "greyish", "faded green", "dusty purple"]
sns.set_palette(sns.xkcd_palette(colors))

#plot 24 and 4 grids side by side
df_new = pd.concat([df_24,df_4], axis = 1, join = 'inner')

fig1, axes1 = plt.subplots(nrows=2, ncols=1, sharex=True)
df_4['4km_cov'].plot(ax=axes1[0], linewidth = 2)
df_24['24km_cov'].plot(ax=axes1[0], linewidth = 1)
axes1[0].set_title('Timeseries of two resolutions')
axes1[0].set_ylabel(r'Snow coverage, $km^{2}$')
axes1[0].yaxis.set_major_formatter(mtick.FormatStrFormatter('%.e'))
axes1[0].set_ylim([df_4['4km_cov'].min(),df_4['4km_cov'].max()*1.1])
axes1[0].set_xlabel(r'Date')
axes1[0].legend([r'4x4 $km^{2}$',r'24x24 $km^{2}$'], loc=2)

right_ax1 = axes1[0].twinx()
right_ax1.set_ylim([df_4['4_perc'].min()*100,df_4['4_perc'].max()*100*1.1])
right_ax1.set_ylabel('\%')
right_ax1.grid(False)


tibet_area = 8059061.81949 #calculated via haversine

resolution_perc_diff = 100 * np.divide(np.subtract(df_new['4km_cov'].values,df_new['24km_cov'].values),df_new['24km_cov'].values)
resolution_diff = np.subtract(df_new['4km_cov'].values,df_new['24km_cov'].values)
percent_coverage_difference = 100 * np.divide(np.subtract(df_new['4km_cov'].values,df_new['24km_cov'].values),tibet_area)

df_new['perc_diff'] = percent_coverage_difference
df_new['perc_diff'].plot(ax=axes1[1])
axes1[1].set_title('Percent coverage difference between 4x4 and 24x24 $km^{2}$')
axes1[1].set_ylabel(r'\%')
axes1[1].set_xlabel(r'Date')
Populating the interactive namespace from numpy and matplotlib
WARNING: pylab import has clobbered these variables: ['cos', 'trace', 'Figure', 'radians', 'sqrt', 'rc', 'sin', 'Annotation']
`%matplotlib` prevents importing * from pylab and numpy
Out[22]:
In [23]:
print('max resolution percent difference: {}'.format(percent_coverage_difference.max()))
print('min resolution percent difference: {}'.format(percent_coverage_difference.min()))
max resolution percent difference: 6.24520210433
min resolution percent difference: -2.33613673213
In [24]:
#plot old and new

df_ea_and_new = pd.concat([df_ea, df_24], axis = 1, join = 'inner')
fig2, axes2 = plt.subplots(nrows=2, ncols=1, sharex=True)
df_ea_and_new[['24km_cov_ea','24km_cov']].plot(ax=axes2[0])
axes2[0].set_title('Comparison of timeseries')
axes2[0].set_ylabel(r'Snow coverage, $km^{2}$')
axes2[0].yaxis.set_major_formatter(mtick.FormatStrFormatter('%.e'))
axes2[0].set_ylim([df_ea_and_new['24km_cov_ea'].min(),df_ea_and_new['24km_cov_ea'].max()*1.1])
axes2[0].set_xlabel(r'Date')
axes2[0].legend(['assuming grid = 24x24 $km^{2}$','shoestring formula'], loc=2)

right_ax2 = axes2[0].twinx()
df_ea_and_new[['24_perc_ea','24_perc']].plot(ax=right_ax2)
right_ax2.set_ylabel('\%')
right_ax2.set_ylim([df_ea_and_new['24km_cov_ea'].min()/tibet_area*100,df_ea_and_new['24km_cov_ea'].max()/tibet_area*100*1.1])
right_ax2.grid(False)
right_ax2.legend_.remove()

perc_diff = 100 * np.divide( np.subtract(df_ea_and_new['24km_cov_ea'].values, df_ea_and_new['24km_cov'].values) ,df_ea_and_new['24km_cov'].values)
df_ea_and_new['perc_diff'] = perc_diff
df_ea_and_new['perc_diff'].plot(ax=axes2[1])
axes2[1].set_title('Difference between uniform 24x24 $km^{2}$ assumption vs current method')
axes2[1].set_ylabel(r'\%')
axes2[1].set_xlabel(r'Date')
Out[24]:
In [25]:
print('max method percent difference: {}'.format(df_ea_and_new['perc_diff'].max()))
print('min method percent difference: {}'.format(df_ea_and_new['perc_diff'].min()))
max method percent difference: 54.7943594969
min method percent difference: 32.2199003137

Conclusion

So there you have it. This toolkit can be used for researchers and climate scientists, or it can be used by teachers for educational purposes. This is my first python project, so I'm open to suggestions and improvements on my code. Feel free to send a message. I hope that you may find this usefull. Thanks for reading!